MeLos: Analysis and Modelling of Speech Prosody and Speaking Style

Author

Nicolas Obin

Abstract

This thesis addresses the issue of modelling speech prosody for speech synthesis, and presents MeLos: a complete system for the analysis and modelling of speech prosody, “the music of speech”. Research into the analysis and modelling of speech prosody has increased dramatically in recent decades, and speech prosody has emerged as a crucial concern for speech synthesis. The challenge of speech prosody modelling is to capture the variations of speech prosody that depend on the context, whether linguistic (e.g., linguistic structure), para-linguistic (e.g., emotion), or extra-linguistic (e.g., socio-geographical origins, situation of a communication). Modelling the variability of speech prosody is required to provide natural, expressive, and varied speech in many applications of high-quality speech synthesis, such as multi-media (avatars, video games, storytelling, dialogue systems) and artistic (cinema, theatre, music, digital arts) applications. The objective of this thesis is to model the strategies, alternatives, and speaking style of a speaker, so that they can be varied and adapted for natural, expressive, and varied speech synthesis. The original contributions of the study stem from the special attention paid to combining theoretical linguistics and statistical modelling in order to provide a complete speech prosody system that can be used for speech synthesis. In particular, speech prosody characteristics are described at three linguistic levels, from signal variations to abstract representations. A unified discrete/continuous context-dependent HMM is presented to model the symbolic and the acoustic characteristics of speech prosody. A rich description of the text characteristics, based on a linguistic processing chain that includes surface and deep syntactic parsing, is proposed to refine the modelling of speech prosody in context. Segmental HMMs and Dempster-Shafer fusion are used to balance linguistic and metric constraints in the production of a pause. A context-dependent HMM is proposed to model the f0 variations based on the stylization and the trajectory modelling of short- and long-term variations simultaneously over various temporal domains. The proposed system is used to model the strategies and alternatives of a speaker, and is extended to the modelling of speaking style shared among speakers using shared-context-dependent modelling and speaker normalization techniques.
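To make the idea of Dempster-Shafer fusion more concrete, the following is a minimal sketch of Dempster's rule of combination applied to two sources of evidence about a pause. It is not the system described in the thesis: the segmental HMMs are not reproduced, and the hypothesis names, the function name `combine`, and all mass values are assumptions chosen only for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1],
    and the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence supports the intersection of the two sets.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint sets contribute to the conflict between the sources.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Renormalize so that the combined masses sum to 1.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


PAUSE = frozenset({"pause"})
NO_PAUSE = frozenset({"no_pause"})
EITHER = PAUSE | NO_PAUSE  # mass assigned here expresses ignorance

# Hypothetical masses: a linguistic source (e.g. a syntactic boundary)
# and a metric source (e.g. the length of the surrounding segments).
linguistic = {PAUSE: 0.6, NO_PAUSE: 0.1, EITHER: 0.3}
metric = {PAUSE: 0.2, NO_PAUSE: 0.5, EITHER: 0.3}

fused = combine(linguistic, metric)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 3))
```

In this toy setting the mass left on `EITHER` lets each source express uncertainty instead of forcing a decision, which is one reason such a rule can be attractive for balancing two kinds of constraints.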

Similar resources

Prosodic analysis of storytelling discourse modes and narrative situations oriented to text-to-speech synthesis

The generation of synthetic speech with a certain degree of expressiveness has been successful for some particular applications or speaking styles (e.g. emotions). In this context, there is a particular speaking style with subtle speech nuances that may be of great interest for delivering expressive speech: the storytelling style. The purpose of this paper is to define a first step towards deve...

Prosody control for speaking and singing styles

By proper control of prosody, text-to-speech systems already have the capability to imitate distinctive speaking styles. We show two examples where we can capture the critical features: the singing style of Dinah Shore and the speaking style of Martin Luther King Jr. The styles are described by Stem-ML tags (soft template mark-up language), which offers the flexibility needed to control accent ...

The prosody of the TV news speaking style in Brazilian Portuguese

This study characterizes the prosodic structure of the TV news speaking style in Brazil and compares it to the speech of interview subjects on a television talk show. Fifteen distinct metrics, designed to characterize both temporal and melodic characteristics of speech, were evaluated on the two speaking styles. The results of the analysis show that the TV news speaking style is characterized b...

The Prosody of Excitement in Horse Race Commentaries

This study investigates examples of horse race commentaries and compares the acoustic properties with an auditorily based description of the typical suspense pattern from calm to very excited at the finish and relaxation after the finish. With the exception of tempo, the auditory impressions were basically confirmed. The examination shows further that the results of the investigated prosodic pa...

A Model for Varying Speaking Style in TTS systems

This paper aims to enhance the performance of a TTS system by generating various speaking styles. First we describe three speaking styles (Radio News, Political Address and Conversation) and compare the prosodic features found in these authentic styles with the prosody in “neutral” speech uttered by the eLite TTS system ([1]). Differences concern about 20 prosodic characteristics (F0 span, spee...

Publication date: 2011
